High-resolution genomic analysis to investigate the impact of the invasive brushtail possum (Trichosurus vulpecula) and other wildlife on microbial water quality assessments

Escherichia coli are routine indicators of fecal contamination in water quality assessments. Contrary to livestock and human activities, brushtail possums (Trichosurus vulpecula), common invasive marsupials in Aotearoa/New Zealand, have not been thoroughly studied as a source of fecal contamination in freshwater. To investigate their potential role, Escherichia spp. isolates (n = 420) were recovered from possum gut contents and feces and were compared to those from water, soil, sediment, and periphyton samples, and from birds and other introduced mammals collected within the Mākirikiri Reserve, Dannevirke. Isolates were characterized using E. coli-specific real-time PCR targeting the uidA gene, Sanger sequencing of a partial gnd PCR product to generate a gnd sequence type (gST), and for 101 isolates, whole genome sequencing. Escherichia populations from 106 animal and environmental sample enrichments were analyzed using gnd metabarcoding. The alpha diversity of Escherichia gSTs was significantly lower in possums and animals compared with aquatic environmental samples, and some gSTs were shared between sample types, e.g., gST535 (in 85% of samples) and gST258 (71%). Forty percent of isolates gnd-typed and 75% of reads obtained by metabarcoding had gSTs shared between possums, other animals, and the environment. Core-genome single nucleotide polymorphism (SNP) analysis showed limited variation between several animal and environmental isolates (<10 SNPs). Our data show at an unprecedented scale that Escherichia clones are shared between possums, other wildlife, water, and the wider environment. These findings support the potential role of possums as contributors to fecal contamination in Aotearoa/New Zealand freshwater. Our study deepens the current knowledge of Escherichia populations in under-sampled wildlife. 
It presents a successful application of high-resolution genomic methods for fecal source tracking, thereby broadening the analytical toolbox available to water quality managers. Phylogenetic analysis of isolates and profiling of Escherichia populations provided useful information on the source(s) of fecal contamination and suggest that comprehensive invasive species management strategies may assist in restoring not only ecosystem health but also water health where microbial water quality is compromised.

SAMN36426751 to 36426856 for metabarcoding) in the form of forward and reverse raw reads fastq files." The links included in the text will become active once the article has been accepted. Furthermore, so that reviewers can see the data before acceptance, a confidential reviewer link to the BioProject was provided with the initial submission.

Figure Copyright Compliance:
We have re-created Figure 1 using QGIS and Creative Commons Attribution License (CC BY 4.0) background imagery from LINZ, which aligns with PLOS ONE's copyright guidelines. Attribution to the source has been added to the caption with appropriate links according to LINZ guidelines: Background aerial imagery sourced from Toitū Te Whenua LINZ CC BY 4.0 Imagery Basemap contributors. This image was checked through PACE as suggested, as the previous images had been.

Reference List Review and Retracted Papers:
We have thoroughly reviewed our reference list to ensure its completeness and correctness. We confirm that our reference list contains only current and relevant references, with no citations to retracted papers. No references have been added in our manuscript since the initial submission.

Reviewer #1:
(NB: the line numbers indicated below refer to the initial submission, not the amended manuscript)
1. Line 62-63: Please insert "an". Considerable resources are spent in an attempt to control possum
The sentence has been revised as suggested. It now reads, "Considerable resources are spent in an attempt to control possum numbers [...]"
2. Line 123: The "of" sentence should be changed to "as".
We are unsure what the reviewer means by "The "of" sentence should be changed to "as"." The only "of" present in line 123 cannot be replaced by "as". The sentence has been revised for clarity nonetheless. It now reads "The geometric mean of the Most Probable Number (MPN) of E. coli/100 mL of water was 317 (geometric standard deviation factor ×/÷ 1.11) at the Mākirikiri sampling point and 487 (×/÷ 1.51) at the Confluence."
3. Line 252: Since alpha diversity is used to established the diversity within a sample, I am thinking that the statement "The difference in alpha diversity between water and other sample types was significant at the 0.05 level, except for soil (p = 0.789) and ship rats (p = 0.074, Table 2)" is not reflecting the actual meaning and should be rephrased. This is a suggestion for your consideration "The difference in the alpha diversity of water and other samples types was significant at 0.05 level, except for soil (p = 0.789) and ship rats (p = 0.074, Table 2)."
We appreciate the reviewer's suggestion, and the sentence has been rephrased as follows: "the difference in the alpha diversity of water and other sample types was significant at the 0.05 level, except for soil (p = 0.789) and ship rats (p = 0.074,
Additionally, I would like the author to provide further explanation in their materials and method section how they determined the association between the ST and gST.
We apologize for any confusion. We have rephrased the sentence as follows: "Sequence types ST681 and ST11707, found in gST535 isolates in this study and previous work [34], were not found in human cases notified between 2019 and 2021 in New Zealand, suggesting this strain has limited public health implications." The word 'association' may not have been the proper term to describe the sequence types found in gST535 isolates and has been removed. We did not perform any specific analyses to determine an association between ST and gST and therefore did not modify the materials and methods section.
9. Line 355: The sentence has been updated to include "that": "indicated that New Zealand strains did not cluster with isolates recovered in other countries."

Reviewer #2:
1. Abstract (L30-32): It would be good to include some more specific details of sharing between possums and environmental samples in the abstract. What proportion of environmental isolates were also detected in possums? What proportion of possum isolates were shared with other animals?
We appreciate the reviewer's suggestion and have added more specific details to the abstract regarding the proportion of gSTs shared: "Forty percent of isolates gnd-typed and 75% of reads obtained by metabarcoding had gSTs shared between possums, other animals, and the environment". As this information was not present in the main text, we also added it to the results section, together with an extra Venn diagram offering more details in the supporting information.

2. Results (L128): please provide a brief description of the culture based methods used.
A brief description of the culture-based methods used has been provided as requested. The sentence now reads: "A total of 420 isolates of presumptive E. coli were obtained from 105/106 samples (99.1%) following enrichment in EC broth and subculturing onto CHROMagar ECC, an E. coli selective medium".
3. Results (L142): clarify that the 55% refers to the two gSTs combined.
The text has been clarified to indicate that the 55% refers to the combined proportion of the two gSTs. It now reads "29 isolates each, in total 55% of fecal and gut possum isolates gnd-typed".

4. Results (L146-149): So the most abundant gSTs in possums were not detected at high rates in the environment and other animals but were present. A version of this would be good to include in the abstract.
We apologize for any confusion. The two most abundant gSTs isolated from possums were gST535 and gST258. To clarify, gST535 was the second most frequent gST isolated in environmental samples. Although it was not clearly written in the text (we have now added this information), gST258 was the most abundant gST isolated from other animals. So, it would not be true to say that "the most abundant gSTs in possums were not detected at high rates in the environment and other animals but were present". We want to emphasize that, given the relatively small number of isolates gnd-typed from each environmental sample and the high alpha diversity identified by metabarcoding in these samples, the resolution of the culture-based method is too low to offer a good picture of all gSTs present and to make meaningful inferences about what is or is not shared between sample types.
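The link between high alpha diversity and low per-gST relative abundance can be illustrated with a minimal Python sketch. It assumes the usual natural-log definition of the Shannon index; the read counts and the helper names `shannon_index` and `relative_abundance` are hypothetical and are not taken from the study data or analysis code.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def relative_abundance(counts, index):
    """Share of the total count held by the taxon at position `index`."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return counts[index] / total


# Hypothetical read counts per gST (illustrative only, not study data).
possum_like = [900, 80, 20]  # few gSTs, one of them dominant
water_like = [50] * 20       # many gSTs, evenly spread

print("H (possum-like):", round(shannon_index(possum_like), 3))
print("H (water-like): ", round(shannon_index(water_like), 3))
print("relative abundance of one gST in water-like sample:",
      relative_abundance(water_like, 0))
```

In the evenly spread community each gST holds only a small share of the reads even though the index is high, so a handful of isolates picked from such a sample would be expected to miss most of the gSTs present.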
5. Results (L170): why was the Clermont method of E. coli phylogenetic grouping not conducted on all isolates to give a more representative picture of the phylogroup distributions between the sample types? This could easily be done on the 420 isolates - well at least those uidA positive.
We appreciate the suggestion. The main rationale for using the uidA PCR was the preliminary identification of cryptic clades vs. E. coli. When the study was designed, we decided not to use the phylogroup level (and hence the Clermont PCR), as information on the phylogroup does not provide enough power to draw concrete conclusions on what is present in environmental vs. animal samples. We studied and compared the diversity of Escherichia populations between sample types at the gST level, which offers a better resolution than phylogroups, and we confirmed the presence of clonal strains between sample types using the highest possible resolution, i.e., WGS. The phylogroups inferred in silico for the isolates submitted to WGS were presented because they are useful information, but they are not central to answering the research question.
6. Results (L172): How did these correspond to the gSTs?
The correspondence between phylogroups and gSTs has been further explained in the manuscript, bearing in mind other comments from reviewer #1 (cf. comment #8).
7. Results (L250-251): Please present the statistical significance of these comparisons.
We acknowledge the potential for confusion in our original presentation. The statistical significance of the comparisons appeared in the following sentence. We moved the reference to tables and figures to the end of this second sentence and added a linking word at the start to avoid any misunderstanding: "Despite the higher number of reads obtained from possums compared to other sample types, the alpha diversity (Shannon index) was higher in periphyton, water and sediment than in animal sources including possums. Indeed, the difference in the alpha diversity of water and other sample types was significant at the 0.05 level, except for soil (p = 0.789) and ship rats (p = 0.074,
8. Discussion (L273): Prevalence yes, but these gSTs did not appear to be at high abundance in the environmental samples according to this statement from the results: "The three most frequent gSTs were gST152 (five isolates, 7.8% of environmental isolates), gST535 and gST587 (both four isolates, each 6.3% of environmental isolates respectively)."
Regarding the reviewer's concern that the gSTs shared between animal and environmental samples did not appear to be at high abundance in the environmental samples, we wonder if the reviewer meant relative abundance rather than (absolute) abundance. As mentioned in our response to Reviewer #2's point 4 in this letter, the high alpha diversity in environmental samples means there is a big denominator for the total number of gSTs identified in this type of sample, and hence a small relative abundance. In terms of absolute abundance, these gSTs were among the most frequent gSTs detected in the environment. We checked the ASV table in the metabarcoding dataset, and the abundance of gSTs in terms of reads confirms this, with gST535 the 4th and 1st most abundant gST in environmental and other animal samples, respectively, and gST258 the 26th and 8th most abundant (out of 568 gSTs). We added a sentence to emphasize this point: "Despite a low relative abundance of those shared gSTs in environmental samples, linked to the high alpha-diversity in this type of samples, those shared gSTs were among the most frequently detected."
9. Discussion (L298): suggest changing "confirm" to "show" or similar. At present, this study does not provide any evidence that removing possums would increase water quality. Could the authors do an analysis to assess this? If you assume all gST535 and maybe gST258 isolates in the water were from possums, then how much the E. coli concentrations in the water be reduced if they were removed?
We have rephrased the sentence to indicate that another study "would be needed to confirm" whether sustained pest control programs can improve water quality. Given the enrichment step in our methodology, we think that using our data to give a quantitative estimate of water quality improvement would remain speculative, and that a specific study design should be implemented to answer this question without bias.
10. Discussion (L302): Which analysis showed that there was no difference in community composition between sample types? This seems surprising and I'd like to see a permanova test to confirm that as well as a PCoA of the Bray Curtis distances.
We apologize for any confusion. The analysis mentioned there refers to the test mentioned L560-562 in the methods section: "The null hypothesis of equal median measured relative abundance across sample types was tested with the testBetaDiversity() function from the DivNet package with a diagonal design matrix using a bootstrapped pseudo-F test (10,000 iterations)." and L253-257 in the results: "As regards to the beta diversity, we failed to reject the null hypothesis of equal median measured relative abundance across sample types at the 0.05 level (bootstrap p-value = 0.714). In other words, despite lower alpha diversities in animal compared to environmental samples, we did not detect a significant difference in measured beta diversity (Bray-Curtis distances) between sample types (S4 Fig)".
We had done a PCA during the exploratory analysis of the data, not a PCoA. Both this PCA and the PCoA, as well as the permanova test requested by Reviewer #2, are presented below. As Reviewer #2 expected, the permanova detected significant beta diversity clustering of samples between sample types, but with non-homogeneous dispersion between sample types:
NB: When using the source (Environmental/Animal) as an independent factor rather than the sample type, the permanova also detected significant beta diversity clustering of samples between animal/environmental samples, but homogeneous dispersion between animal/environmental samples, maybe because of the difference in dispersion between possums vs. other animal species and soil vs. other environmental samples. The betadisper plot is shown below:
The PCA and PCoA give similar results and identify a broad variation in distances among possums, scattered in a triangle (corresponding to the three main gSTs as shown in the biplot below). The variation in distances between sample compositions is driven by possums, which can be explained by the presence, in a given sample, of one of the dominant gSTs to the exclusion of the other dominant gSTs, coupled with a very low alpha diversity in this species. It is unclear whether the dominance of one specific gST is real or artificially obtained due to the enrichment step allowing preferential growth of one strain. The first option is likely, as an analogous longitudinal study of commensal E. coli strains in an Australian population of mountain brushtail possums (Trichosurus cunninghami) also showed in that species, without pre-enrichment, a very low alpha diversity and an average of 2.2 strains per possum, with changes in the main strain over sampling occasions [ref 40 in the manuscript]. A mention of this study has been added to the discussion.
Indeed, in both the PCoA/permanova and the DivNet approaches, the beta diversity is estimated using the Bray-Curtis distance (BCD) as the metric, but there is a difference in how those distances are estimated and how the hypothesis of a difference between groups is tested.
In PCoA, BCD are calculated at the sample level, and a permanova (adonis2 in vegan) then tests whether the 'average' community composition is similar between the different groups (in this case the different animal and environmental sample types).
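As a minimal sketch of what "calculated at the sample level" means, the Python snippet below computes the standard Bray-Curtis dissimilarity, sum|u_i - v_i| / sum(u_i + v_i), for every pair of samples. The sample names and read counts are hypothetical, the helper functions are illustrative only, and the snippet does not reproduce the permanova test or the DivNet estimates.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("count vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("at least one count must be non-zero")
    return numerator / denominator


def pairwise_bray_curtis(samples):
    """Return {(name_a, name_b): dissimilarity} for every pair of samples."""
    return {
        (a, b): bray_curtis(samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


# Hypothetical read counts for three gSTs in three samples (not study data).
samples = {
    "possum_1": [90, 10, 0],   # dominated by the first gST
    "possum_2": [0, 10, 90],   # dominated by the third gST
    "water_1": [30, 40, 30],   # more even community
}

distances = pairwise_bray_curtis(samples)
for pair, value in distances.items():
    print(pair, round(value, 3))
```

In this toy example the two possum-like samples, each dominated by a different gST, are further from each other than either is from the water-like sample, which is the kind of within-group spread that a sample-level analysis picks up.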
With the DivNet approach, BCD are estimated at the sample type level (this means within a sample type, the Bray-Curtis distance will be zero), accounting for unobserved taxa. The testBetaDiversity() function then "uses output from DivNet() to estimate community centroids within groups defined by the groups argument [in this case the different animal and environmental sample types] and test a null hypothesis of equality of all group centroids against a general alternative" (https://rdrr.io/github/adw96/DivNet/man/testBetaDiversity.html).
As we were interested in comparing communities between the types of environments/animal species, as estimated from the available samples, rather than comparing communities between these samples, we consider it more appropriate and meaningful to describe the results based on the DivNet model (lack of evidence of a difference in estimated beta-diversity between habitats sampled), rather than the PCoA/permanova approach (difference in beta-diversity between observed samples of different types that may not be meaningful outside of our study). We also invite the reviewer to see the comment on this subject from the statistician who created DivNet at this link: https://github.com/adw96/DivNet/issues/13#issuecomment-411152644. She considers permanova inadequate for hypothesis-testing. For this reason, we answered the reviewer's query in this letter, but do not feel it would be suitable to present the PCoA/permanova results in the manuscript.
We have made mention of the use of the DivNet approach earlier in the results and discussion (and not just the methods section) to avoid any confusion, and have carefully rephrased the results ("did not detect a difference" rather than "there was no difference"). We have also modified the sentence in the discussion to more accurately discuss our results. The sentence now reads: "but we failed to find evidence for a difference in measured community composition between sample types."
11. Discussion (L229-340): This is a good point and I think any abundance measures from this data should be interpreted with caution. However, I do think that they have some value where there are big differences between sample types, which might be worth discussing with the appropriate caveats.
Did the reviewer mean L339-340? We have modified the end of this paragraph to discuss with the appropriate caveats the main findings in terms of abundance of reads and isolates across sample types.
12. Discussion (L353): But you just said monitoring is limited, so this statement can not be made with certainty.
There is indeed no certainty on the public health impact of gST535, and we used the verb "suggesting" in this sentence. We have modified the following sentence to underline that this remains a hypothesis. It now reads: "[…] not found in human cases notified between 2019 and 2021 in New Zealand, suggesting this strain has limited public health implications. In support of this hypothesis, a phylogenetic tree […]"
13. Discussion (L421): I think this is overstated. There needs to be evidence that possum gSTs are contributing to high E. coli loads in water before this statement can be made.
The statement has been adjusted to reflect the need for further evidence before making assertions about the impact pest control measures could have on water quality. The sentence now reads: "It is therefore realistic to hypothesize that the removal of invasive pests from the environment may also improve microbial water quality assessments in addition to enhancing endemic biodiversity."

Figure 1: Principal Component Analysis of animal and environmental samples submitted to gnd metabarcoding (log transformed data).

Figure 2: Principal Coordinates Analysis of animal and environmental samples submitted to gnd metabarcoding. A Lingoes transformation was used as there were negative eigenvalues.

Figure 4: Biplot for the PCA of gSTs identified in possum (Tv), other wild species (Mf, Ee, Rr, C), and environmental samples (C, E, G, J)

261: As you indicated in parenthesis some gSTs were previously found in E. marmotae an E. ruysiae and E. whittamii, I kindly want to find out if these works have been published? If yes kindly included the reference.
We apologize for any confusion. The isolates of E. marmotae, E. ruysiae, and E. whittamii mentioned were references to gSTs identified only in those species in the gndDb. The text has been revised to clarify this point with mention of the publication, and now reads: "Out

347: Please insert "that". "Suggesting that these virulence factors may offer advantages for survival"
The sentence has been revised to include "that".
8.